Interpreting patient case descriptions with biomedical language models
The advent of pre-trained language models (LMs) has enabled unprecedented advances in the field of Natural Language Processing (NLP). In particular, various specialised
LMs for the biomedical domain have been introduced, and, like their general-purpose counterparts, these models have achieved state-of-the-art results on many biomedical NLP tasks. This might suggest that they are capable of medical reasoning. However, given the challenging nature of the biomedical domain and the scarcity
of labelled data, it is still not fully understood what kind of knowledge these models encapsulate and how they can be enhanced further. This research seeks to address
these questions, with a focus on the task of interpreting patient case descriptions, which
provides the means to investigate the model’s ability to perform medical reasoning. In
general, this task is concerned with inferring a diagnosis or recommending a treatment
from a text fragment describing a set of symptoms accompanied by other information.
Therefore, we started by probing pre-trained language models. For this purpose, we
constructed a benchmark that is derived from an existing dataset (MedNLI). Following
that, to improve the performance of LMs, we used a distant supervision strategy to
identify cases that are similar to a given one. We then showed that using such similar cases can lead to better results than other strategies for augmenting the input to
the LM. As a final contribution, we studied the possibility of fine-tuning biomedical
LMs on PubMed abstracts that correspond to case reports. In particular, we proposed
a self-supervision task which mimics the downstream tasks of inferring diagnoses and
recommending treatments. The findings in this thesis indicate that the performance of the considered biomedical LMs can be improved by using methods that go beyond
relying on additional manually annotated datasets.
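The self-supervision task described above can be illustrated with a toy sketch. This is not the thesis's actual code: the data layout and field names are hypothetical, and it simply shows how (description, candidate diagnosis, label) training examples could be derived from case reports so that the task mirrors the downstream setting of inferring a diagnosis.

```python
# Toy sketch of a self-supervision task derived from case reports.
# Assumption: each case report is a dict with a symptom description and
# a known diagnosis; the field names used here are hypothetical.

import random

def build_examples(case_reports, seed=0):
    """Create (description, candidate_diagnosis, label) triples.

    For every report, the true diagnosis yields a positive example, and
    a diagnosis sampled from a different report yields a negative one.
    """
    rng = random.Random(seed)
    diagnoses = [r["diagnosis"] for r in case_reports]
    examples = []
    for report in case_reports:
        examples.append((report["description"], report["diagnosis"], 1))
        # Sample a distractor diagnosis from another report.
        distractor = rng.choice(
            [d for d in diagnoses if d != report["diagnosis"]])
        examples.append((report["description"], distractor, 0))
    return examples

reports = [
    {"description": "fever, productive cough, crackles on auscultation",
     "diagnosis": "pneumonia"},
    {"description": "polyuria, polydipsia, elevated fasting glucose",
     "diagnosis": "type 2 diabetes"},
]
examples = build_examples(reports)
```

The resulting triples could then be used to fine-tune a biomedical LM as a binary classifier before applying it to the actual diagnosis-inference task.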
Probing Pre-Trained Language Models for Disease Knowledge
Pre-trained language models such as ClinicalBERT have achieved impressive
results on tasks such as medical Natural Language Inference. At first glance,
this may suggest that these models are able to perform medical reasoning tasks,
such as mapping symptoms to diseases. However, we find that standard benchmarks
such as MedNLI contain relatively few examples that require such forms of
reasoning. To better understand the medical reasoning capabilities of existing
language models, in this paper we introduce DisKnE, a new benchmark for Disease
Knowledge Evaluation. To construct this benchmark, we annotated each positive
MedNLI example with the types of medical reasoning that are needed. We then
created negative examples by corrupting these positive examples in an
adversarial way. Furthermore, we define training-test splits per disease,
ensuring that no knowledge about test diseases can be learned from the training
data, and we canonicalize the formulation of the hypotheses to avoid the
presence of artefacts. This leads to a number of binary classification
problems, one for each type of reasoning and each disease. When analysing
pre-trained models for the clinical/biomedical domain on the proposed
benchmark, we find that their performance drops considerably.
Comment: Accepted at Findings of ACL 2021.
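The per-disease evaluation protocol described above can be sketched as follows. This is a minimal illustration with a hypothetical data layout, not the DisKnE construction code: for each target disease, every example about that disease goes into the test set and is excluded from training, so no knowledge about the test disease can be learned from the training data.

```python
# Sketch of a per-disease train/test split: examples mentioning the
# target disease are held out entirely, so the model cannot pick up
# knowledge about that disease during training.

def per_disease_split(examples, target_disease):
    """examples: list of (text, disease, label) triples."""
    train = [e for e in examples if e[1] != target_disease]
    test = [e for e in examples if e[1] == target_disease]
    return train, test

data = [
    ("chest pain radiating to the left arm", "myocardial infarction", 1),
    ("chronic cough and weight loss", "tuberculosis", 1),
    ("chest pain radiating to the left arm", "tuberculosis", 0),
]
train, test = per_disease_split(data, "tuberculosis")
```

Repeating this split for every disease yields one binary classification problem per disease, as in the benchmark.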
Interpreting patient descriptions using distantly supervised similar case retrieval
Biomedical natural language processing often involves the interpretation of patient descriptions, for instance for diagnosis or for recommending treatments. Current methods, based on biomedical language models, have been found to struggle with such tasks. Moreover, retrieval-augmented strategies have only had limited success, as it is rare to find sentences which express the exact type of knowledge that is needed for interpreting a given patient description. For this reason, rather than attempting to retrieve explicit medical knowledge, we instead propose to rely on a nearest neighbour strategy. First, we retrieve text passages that are similar to the given patient description, and are thus likely to describe patients in similar situations, while also mentioning some hypothesis (e.g. a possible diagnosis of the patient). We then judge the likelihood of the hypothesis based on the similarity of the retrieved passages. Identifying similar cases is challenging, however, as descriptions of similar patients may superficially look rather different, in part because they often contain an abundance of irrelevant details. To address this challenge, we propose a strategy that relies on a distantly supervised cross-encoder. Despite its conceptual simplicity, we find this strategy to be effective in practice.
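The nearest-neighbour strategy above can be sketched in a few lines. In this toy illustration, a bag-of-words cosine similarity stands in for the distantly supervised cross-encoder (whose training is not shown), and the corpus, description, and hypotheses are invented: a hypothesis is scored by how similar the passages mentioning it are to the patient description.

```python
# Sketch of nearest-neighbour hypothesis scoring: retrieve passages that
# mention the hypothesis, rank them by similarity to the patient
# description, and average the top-k similarities. A bag-of-words cosine
# is used here as a stand-in for a learned cross-encoder.

from collections import Counter
from math import sqrt

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_hypothesis(description, hypothesis, corpus, k=2):
    """Mean similarity of the top-k passages mentioning the hypothesis."""
    candidates = [p for p in corpus if hypothesis.lower() in p.lower()]
    sims = sorted((cosine(description, p) for p in candidates),
                  reverse=True)[:k]
    return sum(sims) / len(sims) if sims else 0.0

corpus = [
    "patient with fever and productive cough was diagnosed with pneumonia",
    "fever and cough in a patient later found to have pneumonia",
    "patient with joint pain was diagnosed with rheumatoid arthritis",
]
desc = "elderly patient presenting with fever and productive cough"
```

In this setup, a hypothesis supported by several similar retrieved cases (here, "pneumonia") would receive a higher score than one whose mentions occur in dissimilar passages.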